*********************************************
* 		Do local tax preparers strengthen  	*
*		the influence of tax policy on 		*
*		individual behavior?				*
*											*
*		Madsen and Walton					*
*											*
* 		NTJ 2025							*	
* 		This code prepared by Madsen		*
*********************************************

clear
clear matrix
set more off

cd C:\Data

************************************************************************************************************************************************************************************************
************************************************************************************************************************************************************************************************
************************************************************************************************************************************************************************************************
********************* Organization *************************************************************************************************************************************************************
************************************************************************************************************************************************************************************************
************************************************************************************************************************************************************************************************
************************************************************************************************************************************************************************************************

*Input files used in the "organization" code below: 
	*several .xls files associated with appliance sales (not proprietary/legally restricted)
		*files containing raw "unit shipment sales data archives" data. These are the files describing energy star appliance sales. We downloaded ton this website: https://www.energystar.gov/partner-resources/products_partner_resources/brand-owner/unit-shipment-data/archives
	*TaxStateES2000.dta through 2009.dta
		*These are the energy star files produced by my first wave of data cleaning. 
	*StateFIPS.dta, SOIfips.dta, and ZipToFips.dta
		*Crosswalks linking state and county names and abbreviations to fips codes. 
	*IPUMSUSA2023.dta (proprietary/legally restricted but easily accessible to researchers who register with the IPUMS project (which is free and easy to do))
		*An extract from the IPUMS USA database. See https://usa.ipums.org/usa/. 
	*SOIwork2.dta
		*Data we produce using the raw (single year, non-harmonized) SOI files. 
		*These original files come from the "SOI tax stats – Individual income tax statistics – ZIP code data (SOI)" available here: https://www.irs.gov/statistics/soi-tax-stats-individual-income-tax-return-form-1040-statistics. 
		*We process them for use in this study and describe this process in Appendix A. 
	*sharpbunching.dta
		*Data from Chetty et al. (2013). It is available for download here: https://www.openicpsr.org/openicpsr/project/112681/version/V1/view.
	*zip3_state_crosswalk.dta
		*A crosswalk provided in the Chetty et al. (2013) replication files. 
		
*Output files produced by the "organization" code below: 
	*EnergyStarStateYear.dta
		*A file showing sales of energy star appliances by state-year that is the primary input to our H1 (model 1 and model 2) tests. 
	*TaxIPUMS_HO_sy.dta and TaxIPUMS_HO_cy.dta
		*Files showing home ownership rates by state-year (sy) and county-year(cy), control variable in model 1 and model 2. 
	*TaxIPUMSstate.dta and TaxIPUMScount.dta
		*Files showing characteristics, calculated from IPUMS microdata, for states and counties. 
	*SOIwork3.dta.dta, SOICounty.dta, and SOIState.dta
		*Files showing SOI tax return data at the zip code, county, and state levels. 
	
	
	*******************************************************************************************************************************************************
	*energy efficient AC***********************************************************************************************************************************	
	*******************************************************************************************************************************************************
		*Energy star data can be downloaded from the "unit shipment and sales data archives" on this website: https://www.energystar.gov/partner-resources/products_partner_resources/brand-owner/unit-shipment-data/archives
			import excel inputs\EnergyStar\2000_final.xls, sheet("Final 2000") cellrange(A6:C210) firstrow clear
			rename C atype
			rename A state
			reshape wide atype, i(state) j(B) string
			gen year = 2000
			save inputs\EnergyStar\TaxStateES2000.dta, replace

			import excel inputs\EnergyStar\2001_Sales_Data_Full_Draft.xls, sheet("2001 State") cellrange(A5:C205) firstrow clear
			rename PercentENERGYSTAR atype
			rename ApplianceType B
			rename State state
			order state B atype
			sort state B
			reshape wide atype, i(state) j(B) string
			gen year = 2001
			save inputs\EnergyStar\TaxStateES2001.dta, replace

			import excel inputs\EnergyStar\2002_sales_data_by_state.xls, sheet("Final Full Year") cellrange(A5:C209) firstrow clear
			rename PercentENERGYSTAR atype
			rename ApplianceType B
			rename State state
			order state B atype
			sort state B
			reshape wide atype, i(state) j(B) string
			gen year = 2002
			save inputs\EnergyStar\TaxStateES2002.dta, replace

			import excel inputs\EnergyStar\2003_appliance_sales_data.xls, sheet("State") cellrange(A3:D904) firstrow clear
			keep if Quarter == "ALL"
			drop Quarter
			rename PercentENERGYSTAR atype
			rename ApplianceType B
			rename State state
			order state B atype
			sort state B
			reshape wide atype, i(state) j(B) string
			gen year = 2003
			save inputs\EnergyStar\TaxStateES2003.dta, replace

			import excel inputs\EnergyStar\2004q4.xls, sheet("State") cellrange(A4:D905) firstrow clear
			keep if Quarter == "ALL"
			drop Quarter
			rename PercentENERGYSTAR atype
			rename ApplianceType B
			rename State state
			order state B atype
			sort state B
			reshape wide atype, i(state) j(B) string
			gen year = 2004
			save inputs\EnergyStar\TaxStateES2004.dta, replace

			import excel inputs\EnergyStar\2005fullyear.xls, sheet("State") cellrange(A4:D905) firstrow clear
			keep if Quarter == "ALL"
			drop Quarter
			rename PercentENERGYSTAR atype
			rename ApplianceType B
			rename State state
			order state B atype
			sort state B
			reshape wide atype, i(state) j(B) string
			gen year = 2005
			save inputs\EnergyStar\TaxStateES2005.dta, replace

			import excel inputs\EnergyStar\2006fullYear.xls, sheet("State") cellrange(A3:D904) firstrow clear
			keep if Quarter == "ALL"
			drop Quarter
			rename PercentENERGYSTAR atype
			rename ApplianceType B
			rename State state
			order state B atype
			sort state B
			reshape wide atype, i(state) j(B) string
			gen year = 2006
			save inputs\EnergyStar\TaxStateES2006.dta, replace

			import excel inputs\EnergyStar\2007FinalSalesData.xls, sheet("State") cellrange(A3:D904) firstrow clear
			keep if Quarter == "ALL"
			drop Quarter
			rename PercentENERGYSTAR atype
			rename ApplianceType B
			rename State state
			order state B atype
			sort state B
			reshape wide atype, i(state) j(B) string
			gen year = 2007
			save inputs\EnergyStar\TaxStateES2007.dta, replace

			import excel inputs\EnergyStar\2008FinalSalesData.xls, sheet("State") cellrange(A3:D905) firstrow clear
			keep if Quarter == "ALL"
			drop Quarter
			rename PercentENERGYSTAR atype
			rename ApplianceType B
			rename State state
			order state B atype
			sort state B
			reshape wide atype, i(state) j(B) string
			gen year = 2008
			foreach name in "AC" "CW" "DW" "RF" {
				destring atype`name', replace
			}
			save inputs\EnergyStar\TaxStateES2008.dta, replace

			import excel inputs\EnergyStar\2009FinalSalesData.xls, sheet("State") cellrange(A3:D1155) firstrow clear
			keep if Quarter == "All"
			drop Quarter
			rename PercentENERGYSTAR atype
			rename ApplianceType B
			rename State state
			order state B atype
			sort state B
			reshape wide atype, i(state) j(B) string
			gen year = 2009
			foreach name in "AC" "CW" "DW" "RF" "WH" {
				destring atype`name', replace
			}
			save inputs\EnergyStar\TaxStateES2009.dta, replace

		********* append energy star data together, end up with state-year energy star data.
			*append the annual energy star files together
				use inputs\EnergyStar\TaxStateES2000.dta, clear
				forvalues i = 1(1)9 {
					display "`i'"
					append using inputs\EnergyStar\TaxStateES200`i'.dta
				}

			*add state fips
				rename state abbr
				replace abbr = trim(abbr)
				merge m:1 abbr using inputs\StateFIPS.dta
				drop if _merge == 2
				drop _merge
				rename fips statefip
			
			save outputs\EnergyStarStateYear.dta, replace
				*atypeAC = air conditioners
				*atypeCW = clothes washers
				*atypeDW = dishwashers
				*atypeRF = refrigerators


	*******************************************************************************************************************************************************
	****IPUMS**********************************************************************************************************************************************
	*******************************************************************************************************************************************************
		
		*this first part gets a file of state-year rates of property ownership. 
		*I have to do this separately from the stuff below about occupations because this is state-year and that later stuff is not. 
			use inputs\IPUMS\IPUMSUSA2023.dta, clear
				gen homeownership = (ownershp == 1)
				collapse 	(mean) sy_homeownership=homeownership ///
							(sum) homeownN=homeownership  [pweight=perwt], by(statefip year) fast
				
				replace sy_homeownership = . if homeownN == 0
			save outputs\TaxIPUMS_HO_sy.dta, replace
			
			*do the same at the county level
			use inputs\IPUMS\IPUMSUSA2023.dta, clear
				gen homeownership = (ownershp == 1)
				replace countyfip = countyfip + (statefip * 1000)
				collapse 	(mean) cy_homeownership=homeownership ///
							(sum) homeownN=homeownership  [pweight=perwt], by(countyfip year) fast
				
				replace cy_homeownership = . if homeownN == 0
			save outputs\TaxIPUMS_HO_cy.dta, replace
			
		
		******this loop breaks the file apart (for RAM reasons on an old computer) and then cleans and organizes. 
			* It cleans up and summarizes
				*occupations
				*incomes and educations
				*other demographics
			* then it collapses this information at either the state-year or county-year level
		
			foreach scopee in "state" "county" {
				foreach timee in "1" "2" "3" {
					if `timee' == 1 {
						local filterrule = "year < 2005"
					}
					if `timee' == 2 {
						local filterrule = "year > 2004 & year < 2010"
					}
					if `timee' == 3 {
						local filterrule = "year > 2009"
					}
					
					use inputs\IPUMS\IPUMSUSA2023.dta, clear
								
					keep if `filterrule'
					
					foreach name in inctot ftotinc incwage {
						replace `name' = . if `name' == 9999999
						replace `name' = . if `name' == 999999
						replace `name' = . if `name' == 9999998
						replace `name' = . if `name' < 0
					}
					
					replace countyfip = countyfip + (statefip * 1000)

					gen taxexamine = (occ2010 == 930)
					gen taxprep = (occ2010 == 940)
					gen accountants = (occ2010 == 800)
					gen lawyers = (occ2010 == 2100)
					gen managers = 0
						replace managers = 1 if occ2010 > 9 & occ2010 < 431
						
					gen retailsales = 0
						replace retailsales = 1 if occ2010 > 4699 & occ2010 < 4761
						*this gives me first-line supervisors of sales workers, cashiers, counter and rental clerks, parts salespersons, and retail salespersons
					gen retailind = 0
						replace retailind = 1 if ind1990 > 579 & ind1990 < 583
						*this gives me lumber and building material retailing, hardware stores, retail nurseries and garden stores
						replace retailind = 1 if ind1990 == 631 
						replace retailind = 1 if ind1990 == 632
						*furniture and home furnishing stores, household appliance stores
						replace retailind = . if ind1990 == 0
					
					gen homedepot = 0
						replace homedepot = 1 if retailsales == 1 & retailind == 1
						replace homedepot = . if ind1990 == .
					
					gen highschool = 0
						replace highschool = 1 if educd > 61 & educd < 100
					gen bachelors = (educd == 101)
					gen masters = (educd == 114)
					gen professional = (educd == 115)
					gen doctorate = (educd == 116)

					gen unemployed = (empstat == 2)
					gen employed = (empstat == 1)
					
					gen married = (marst == 1)
						replace married = 1 if marst == 2
						
					gen child = (age < 18)
					
					svyset [pweight=perwt]
					
					bysort `scopee'fip year: egen pop = total(perwt)
					bysort `scopee'fip year: egen obss = count(perwt)
					
					*generate income categories to match SOI categories. 
					*This is going to differ from the weighting of the SOI. SOI is tax returns, which are often for families. My variable here is for individuals. 
					*I think this is the cleanest way to do it because I can least say with precision what I'm getting here. If I were to try to limit it to households, it would get messy in IPUMS. 					
						gen ipmsp0t25k = (ftotinc <= 25000 & ftotinc != .)
						gen ipmsp25t50k = (ftotinc <= 50000 & ftotinc > 25000)
						gen ipmsp50t75k = (ftotinc <= 75000 & ftotinc > 50000)
						gen ipmsp75t100k = (ftotinc <=100000 & ftotinc > 75000)
						gen ipmsp100t200k = (ftotinc <= 200000 & ftotinc > 100000)
						gen ipmsp200kplus = (ftotinc > 200000 & ftotinc != .)

					collapse 	(mean) taxexamine taxprep accountants lawyers managers homedepot highschool bachelors masters professional doctorate educ unemployed employed age married child ftotinc ipmsp0t25k ipmsp25t50k ipmsp50t75k ipmsp75t100k ipmsp100t200k ipmsp200kplus pop obss ///
								(sum) employedc=employed unemployedc=unemployed (max) ipums`scopee'N=obss  [pweight=perwt], by(`scopee'fip year) fast
					
					format pop %14.0f
					format obss %14.0f
					format ipums`scopee'N %14.0f
					format employedc %14.0f
					format unemployedc %14.0f

					save inputs\IPUMS\Tax`timee'`scopee'.dta, replace
				}
			}
			foreach scopee in "state" "county" {
				foreach timee in "1" "2" "3" {
					if `timee' == 1 {
						use inputs\IPUMS\Tax`timee'`scopee'.dta, clear
						erase inputs\IPUMS\Tax`timee'`scopee'.dta
					}
					if `timee' > 1 {
						append using inputs\IPUMS\Tax`timee'`scopee'.dta
							sort year `scopee'fip
						save outputs\TaxIPUMS`scopee'.dta, replace
						erase inputs\IPUMS\Tax`timee'`scopee'.dta
					}
				}
			}
			
			
	*******************************************************************************************************************************************************
	****SOI************************************************************************************************************************************************
	*******************************************************************************************************************************************************
			use inputs\SOI\SOIwork2.dta, clear
			
			*merge on state fips
				replace i1_state = i2_state if i1_state == ""
				replace i1_state = i3_state if i1_state == ""
				replace i1_state = i4_state if i1_state == ""
				replace i1_state = i5_state if i1_state == ""
				replace i1_state = i6_state if i1_state == ""
				replace i1_state = i7_state if i1_state == ""
				
				gen state = lower(i1_state)
				merge m:1 state using inputs\SOI\SOIfips.dta
					sort _merge
				drop _merge
				rename fips statefips

			*states without an income tax
				gen nostateincometax = 0
					replace nostateincometax = 1 if statefips == 56
					replace nostateincometax = 1 if statefips == 53
					replace nostateincometax = 1 if statefips == 48
					replace nostateincometax = 1 if statefips == 46
					replace nostateincometax = 1 if statefips == 32
					replace nostateincometax = 1 if statefips == 12
					replace nostateincometax = 1 if statefips == 2
					
				gen stateincometax = 0
					replace stateincometax = 1 if nostateincometax == 0
		
			order year zipcode 
			sort zipcode year
		*zipcode level
			save outputs\SOIwork3.dta, replace	
		
		*county level
			use outputs\SOIwork3.dta, clear
				rename zipcode zip
				merge m:1 zip using inputs\SOI\ZipToFips.dta
					order year zip countyfips
				
				collapse (mean) pnenergy nenergy0t25k nenergy25t50k nenergy50t75k nenergy75t100k nenergy100t200k nenergy200kplus ///
								pprep prep50kplus prep0t25k prep25t50k prep50t75k prep75t100k prep100t200k prep200kplus prep100kplus ///
								pnrestate nrestate0t25k nrestate25t50k nrestate50t75k nrestate75t100k nrestate100t200k nrestate200kplus ///
								pnitemized nitemized0t25k nitemized25t50k nitemized50t75k nitemized75t100k nitemized100t200k nitemized200kplus ///
								pnumdep pmars2 p0t25k p25t50k p50t75k p75t100k p100t200k p200kplus p100kplus statemean=statefips (rawsum) combinetot=tot (count) zipcount=zip (median) statemedian=statefips [pweight=tot], by(year countyfips)
					gen tot = combinetot 
					gen stateprob = (statemean!=statemedian)			
						order year statemean statemedian stateprob
						sort stateprob
					rename statemedian statefips
				*soi at county level (id is "countyfip", which is a 5 character text variable. first 2 characters are state fip, remaining 3 are county fip. 
			save outputs\SOICounty.dta, replace
		
		*state level
			use outputs\SOIwork3.dta, clear	
				order year zip statefips

				*I need this to do the 1.g.ii robustness tests
				gen prep0t50k = ((prep0t25k * number0t25k) + (prep25t50k * number25t50k)) / (number0t25k + number25t50k)
				
				collapse (mean) pnenergy nenergy0t25k nenergy25t50k nenergy50t75k nenergy75t100k nenergy100t200k nenergy200kplus ///
								pprep prep0t50k prep50kplus prep0t25k prep25t50k prep50t75k prep75t100k prep100t200k prep200kplus prep100kplus ///
								pnrestate nrestate0t25k nrestate25t50k nrestate50t75k nrestate75t100k nrestate100t200k nrestate200kplus ///
								pnitemized nitemized0t25k nitemized25t50k nitemized50t75k nitemized75t100k nitemized100t200k nitemized200kplus ///
								pnumdep pmars2 p0t25k p25t50k p50t75k p75t100k p100t200k p200kplus p100kplus (rawsum) combinetot=tot (count) zipcount=zip [pweight=tot], by(year statefips)				
					gen tot = combinetot 

				*soi at state level (id is two digit "statefips")
				rename statefips statefip
			save outputs\SOIState.dta, replace	
			
************************************************************************************************************************************************************************************************
************************************************************************************************************************************************************************************************
************************************************************************************************************************************************************************************************
********************* Analysis *************************************************************************************************************************************************************
************************************************************************************************************************************************************************************************
************************************************************************************************************************************************************************************************
************************************************************************************************************************************************************************************************
	
	*******************************************************************************************************************************************************
	*energy efficient AC***********************************************************************************************************************************	
	*******************************************************************************************************************************************************
	
	
	
	cd "C:\Users\paulmadsen\UFL Dropbox\Paul Madsen\Data2\Tax energy"
	
	
	
		use outputs\EnergyStarStateYear.dta, clear		
		
		*event time: the law came into effect for 2006 tax year
			gen et = .
			*pre period
			replace et = 1 if year == 2004
			replace et = 1 if year == 2003
			replace et = 1 if year == 2002
			replace et = 1 if year == 2001
			replace et = 1 if year == 2000
			*post period
			replace et = 2 if year == 2005
			replace et = 2 if year == 2006
			replace et = 2 if year == 2007
			replace et = 2 if year == 2008
			replace et = 2 if year == 2009	
			
			gen postdum = (et == 2)
			
			gen prepost = .
				* 1999 is for the pre period from 2000 to 2003 
					replace prepost = 1999 if year > 1999 & year < 2004
				
				* 2004 is for the baseline pre-period (must be pre 2005 because news of the act was first released in 2005) 
					replace prepost = 2004 if year == 2004

				* 2005 is for the implementation years 2005 to 2007
					replace prepost = 2005 if year == 2005
					replace prepost = 2005 if year == 2006
					replace prepost = 2005 if year == 2007
				* 2008 is for the late post period 2008 and 2009
					replace prepost = 2008 if year == 2008
					replace prepost = 2008 if year == 2009
				tabstat year, s(mean median min max) by(prepost)
			
		*add ipums variables
			*IPUMS home ownership
				merge 1:1 statefip year using outputs\TaxIPUMS_HO_sy.dta
					drop if _merge == 2
					drop homeownN
					drop _merge
			
			*IPUMS occupations
				merge 1:1 statefip year using outputs\TaxIPUMSstate.dta
					drop if _merge == 2
					drop _merge

		*I identified high prep states and low prep states by averaging prep measure over 2005-2007
			*IPUMS taxprep high/low cutoffs 
				*calculate percentile cutoffs in 2005-2007. 
					*calculate state-level mean values for occ which I will compare against these cuttoffs to determine high/low occ
					foreach occ in taxprep taxexamine accountants lawyers managers homedepot {
						bysort statefip: egen `occ'm = mean(`occ') if year > 2004 & year < 2008
						bysort statefip: egen `occ'mm = max(`occ'm)
							drop `occ'm

						egen `occ'b90 = pctile(`occ'mm) if year == 2005, p(90) 
						egen `occ'bb90 = max(`occ'b90)
						drop `occ'b90
						tabstat `occ'bb*, s(mean)
						
						gen h`occ'90 = (`occ'mm > `occ'bb90)
					}
							
		*make a scaled down version of pop
			gen popmillion = pop/1000000

		*Table 1
			tabstat atypeAC htaxprep90 sy_homeownership ipmsp25t50k ipmsp50t75k ipmsp75t100k ipmsp100t200k ipmsp200kplus popmillion if year > 2001 & year < 2010 & popmillion != ., s(n mean median min max sd) columns(statistics)
		
		*Table 2
			reghdfe atypeAC i.htaxprep90##i.et sy_homeownership ipmsp25t50k ipmsp50t75k ipmsp75t100k ipmsp100t200k ipmsp200kplus popmillion if year > 2001 & year < 2010, absorb(statefip) vce(robust)
					eststo rrr_2per
					*rvfplot
			reghdfe atypeAC i.htaxprep90##b2004.prepost sy_homeownership ipmsp25t50k ipmsp50t75k ipmsp75t100k ipmsp100t200k ipmsp200kplus popmillion if year > 2001 & year < 2010, absorb(statefip) vce(robust)
					eststo rrr_4per
					*rvfplot
			reghdfe atypeAC i.htaxprep90##b2004.prepost ///
				i.htaxexamine90##b2004.prepost ///
				i.haccountants90##b2004.prepost ///
				i.hlawyers90##b2004.prepost ///
				i.hmanagers90##b2004.prepost ///
				i.hhomedepot90##b2004.prepost ///
				sy_homeownership ipmsp25t50k ipmsp50t75k ipmsp75t100k ipmsp100t200k ipmsp200kplus popmillion if year > 2001 & year < 2010, absorb(statefip) vce(robust)
					eststo rrr_occs	
				
			estout rrr*, cells(b(star fmt(4)) se(par fmt(4))) starlevels(* .1 ** .05 *** .01) stat(r2 N, fmt(3 0 0)) varwidth(30) ///
				drop(1.e* 0.h* *2004.prepost 1.htaxprep90#1.et 1.htaxprep90) ///
				order(	1.htaxprep90#2.et 2.et  ///
						1.htaxprep90#2008.prepost 1.htaxprep90#2005.prepost 1.htaxprep90#1999.prepost ///
						2008.prepost 2005.prepost 1999.prepost )
		
			*Figure 1
				*Above, I calculated only the 90th percentile cutoff. Now I need others for graph
				*IPUMS taxprep high/low cutoffs 
					*calculate percentile cutoffs in 2005-2007. 
						*calculate state-level mean values for occ which I will compare against these cuttoffs to determine high/low occ
						forvalues i = 50(10)80 {
							egen taxprepb`i' = pctile(taxprepmm) if year == 2005, p(`i') 
							egen taxprepbb`i' = max(taxprepb`i')
							drop taxprepb`i'
							tabstat taxprepbb*, s(mean)
							
							gen htaxprep`i' = (taxprepmm > taxprepbb`i')
						}
					
				*Graph as a percentage difference relative to 2004. 					
					forvalues i = 50(10)90 {	
						*get the 2004 value for "high" states
							gen ACh`i'_2004 = atypeAC if year == 2004 & htaxprep`i' == 1
							egen AChigh`i'_2004 = mean(ACh`i'_2004)
						*get the percentage difference from the 2004 value only for "high" states
							gen AChP`i' = (atypeAC - AChigh`i'_2004) / AChigh`i'_2004 if htaxprep`i' == 1
						*get the average of this difference for the year
							bysort year: egen AChighP`i' = mean(AChP`i') 
						
						*repeat for "low" states
							gen ACl`i'_2004 = atypeAC if year == 2004 & htaxprep`i' != 1
							egen AClow`i'_2004 = mean(ACl`i'_2004)
							gen AClP`i' = (atypeAC - AClow`i'_2004) / AClow`i'_2004 if htaxprep`i' != 1
							bysort year: egen AClowP`i' = mean(AClP`i') 
					}

					local schemem = "s2manual"
					twoway (line AChighP50 AClowP50 year if abbr == "AL", xline(2004) lwidth(.4 .4)), scheme(`schemem') name(AC50, replace) legend(label(1 "High Preparers") label(2 "Low Preparers") size(medsmall)) title("High/Low Cutoff 50", size(medium)) xlabel(, labsize(vsmall)) ylabel(-.8(.4).8,labsize(small)) xtitle("") yscale(alt) graphregion(color(white)) bgcolor(white)
					twoway (line AChighP60 AClowP60 year if abbr == "AL", xline(2004) lwidth(.4 .4)), scheme(`schemem') name(AC60, replace) legend(label(1 "High Preparers") label(2 "Low Preparers") size(medsmall)) title("High/Low Cutoff 60", size(medium)) xlabel(, labsize(vsmall)) ylabel(-.8(.4).8,labsize(small)) xtitle("") yscale(alt) graphregion(color(white)) bgcolor(white)
					twoway (line AChighP70 AClowP70 year if abbr == "AL", xline(2004) lwidth(.4 .4)), scheme(`schemem') name(AC70, replace) legend(label(1 "High Preparers") label(2 "Low Preparers") size(medsmall)) title("High/Low Cutoff 70", size(medium)) xlabel(, labsize(vsmall)) ylabel(-.8(.4).8,labsize(small)) xtitle("") yscale(alt) graphregion(color(white)) bgcolor(white)
					twoway (line AChighP80 AClowP80 year if abbr == "AL", xline(2004) lwidth(.4 .4)), scheme(`schemem') name(AC80, replace) legend(label(1 "High Preparers") label(2 "Low Preparers") size(medsmall)) title("High/Low Cutoff 80", size(medium)) xlabel(, labsize(vsmall)) ylabel(-.8(.4).8,labsize(small)) xtitle("") yscale(alt) graphregion(color(white)) bgcolor(white)
					twoway (line AChighP90 AClowP90 year if abbr == "AL", xline(2004) lwidth(.4 .4)), scheme(`schemem') name(AC90, replace) legend(label(1 "High Preparers") label(2 "Low Preparers") size(medsmall)) title("High/Low Cutoff 90", size(medium)) xlabel(, labsize(vsmall)) ylabel(-.8(.4).8,labsize(small)) xtitle("") yscale(alt) graphregion(color(white)) bgcolor(white)

					grc1leg AC50 AC60 AC70 AC80 AC90, scheme(`schemem') legendfrom(AC50) pos(12) graphregion(color(white))

						
	*******************************************************************************************************************************************************
	************ SOI **************************************************************************************************************************************
	*******************************************************************************************************************************************************
		*start with a zip code analysis
		clear
		clear matrix
		clear mata
		set maxvar 50000
			use outputs\SOIwork3.dta, clear
				
				*add county identifiers for fixed effects
					gen zip = zipcode
					merge m:1 zip using inputs\SOI\ZipToFips.dta
						drop if _merge == 2
						drop _merge
						
				*Rescale tot to be in thousands
					replace tot = tot / 1000
					
				*make sure key variables (pnenergy, pprep, pnrestate) are set to missing when missing
				tabstat pnenergy pprep pnrestate pnitemized p25t50k p50t75k p75t100k p100t200k p200kplus tot, by(year) s(mean)
					foreach name in pnenergy nenergy0t25k nenergy0t25k nenergy25t50k nenergy50t75k nenergy75t100k nenergy100t200k nenergy200kplus {
						replace `name' = . if year < 2007
						replace `name' = . if year == 2008
					}
					
					replace pprep = . if year < 2004
					
					foreach name in pnrestate nrestate0t25k nrestate0t25k nrestate25t50k nrestate50t75k nrestate75t100k nrestate100t200k nrestate200kplus {
						replace `name' = . if year < 2007
						replace `name' = . if year == 2008
					}
					
					foreach name in pnitemized nitemized0t25k nitemized0t25k nitemized25t50k nitemized50t75k nitemized75t100k nitemized100t200k nitemized200kplus {
						replace `name' = . if year < 2004
						replace `name' = . if year == 2005
						replace `name' = . if year == 2008
					}
					
				tabstat pnenergy pprep pnrestate pnitemized p0t25k p25t50k p50t75k p75t100k p100t200k p200kplus tot, by(year) s(mean)			
				tabstat pnenergy pprep pnrestate pnitemized p0t25k p25t50k p50t75k p75t100k p100t200k p200kplus tot, by(year) s(n)
				
				*this makes a dummy for every state/year combo. So that isolates this to zip-level variation in tax preparers. 
					tostring year, gen(tyear)
					gen stateyear = state + tyear
					egen stateyeari = group(stateyear)
						replace stateyeari = . if state == ""
				*same for county
					gen countyyear = countyfips + tyear
					egen countyyeari = group(countyyear)
						replace countyyeari = . if countyfips == ""

				*make percentile measures of high pprep
					*calculate zip-level mean values for pprep which I will compare against these cuttoffs to determine high/low pprep
					*pprepmm is the pprep value for a zip code in 2007 but I've got it available to me in the array for all years
						bysort zipcode: egen pprepm = mean(pprep) if year == 2007
						bysort zipcode: egen pprepmm = max(pprepm)
							drop pprepm
							
					*this takes zip code level mean values of pprep and, within states, calculates percentile cutoffs so I can determine which zip codes within a state have high pprep compared to other zipcodes within the state
						bysort statefips: egen pprepb90 = pctile(pprepmm) if year == 2007, p(90) 
						bysort statefip: egen pprepbb90 = max(pprepb90)
						drop pprepb90
					*Make the dummies
						gen hpprep90 = (pprepmm > pprepbb90)
						replace hpprep90 = . if pprep == .
						gen ppreph90 = hpprep90
				
				*clean up singleton observations				
				gen sub = 1
				gen subb = 1
				gen sub_early = 1
										
					forvalues i = 1(1)2 {
						foreach fee in statefips countyfips zipcode stateyeari countyyeari {
							qui: reghdfe pnenergy pprep pnrestate pnitemized p25t50k p50t75k p75t100k p100t200k p200kplus tot if year < 2017 & sub == 1, absorb(`fee') vce(robust) resid
								predict resid, residuals
								replace sub = 0 if resid == .
								tabstat sub if sub == 1, s(n mean median min max)
								drop resid
							
							qui: reghdfe pnenergy pprep pnrestate pnitemized p25t50k p50t75k p75t100k p100t200k p200kplus tot if year < 2011 & sub_early == 1, absorb(`fee') vce(robust) resid
								predict resid, residuals
								replace sub_early = 0 if resid == .
								tabstat sub_early if sub_early == 1, s(n mean median min max)
								drop resid
						}
					}
						
			*Table 3
				tabstat pnenergy hpprep90 pnrestate pnitemized p0t25k p25t50k p50t75k p75t100k p100t200k p200kplus tot ///
					if year < 2016 & sub == 1, s(n mean median min max sd) columns(statistics)
			
			*Figure 2 work
				*make graphs comparing high pprep to others
					*Above, I only made a dummy for the 90th percentile cutoff. Here I add the others I need for the graph

						forvalues i=50(10)80 {
							*this takes zip code level mean values of pprep and, within states, calculates percentile cutoffs so I can determine which zip codes within a state have high pprep compared to other zipcodes within the state
								bysort statefips: egen pprepb`i' = pctile(pprepmm) if year == 2007, p(`i') 
								bysort statefip: egen pprepbb`i' = max(pprepb`i')
								drop pprepb`i'
							*Make the dummies
								gen hpprep`i' = (pprepmm > pprepbb`i')
								replace hpprep`i' = . if pprep == .
								gen ppreph`i' = hpprep`i'
						}

				*percentage difference in usage of the credit relative to 2007. 
					forvalues i = 50(10)90 {	
						*get the 2007 value for credit usage for "high" zips
							gen h`i'_2007 = pnenergy if year == 2007 & hpprep`i' == 1
							egen high`i'_2007 = mean(h`i'_2007)
						*get the percentage difference from the 2007 value only for "high" zips
							gen hP`i' = (pnenergy - high`i'_2007) / high`i'_2007 if hpprep`i' == 1
						*get the average of this difference for the year
							bysort year: egen highP`i' = mean(hP`i') 
						*repeat for "low" zips
							gen l`i'_2007 = pnenergy if year == 2007 & hpprep`i' != 1
							egen low`i'_2007 = mean(l`i'_2007)
							gen lP`i' = (pnenergy - low`i'_2007) / low`i'_2007 if hpprep`i' != 1
							bysort year: egen lowP`i' = mean(lP`i') 
					}
					
					local schemem = "s2manual"
					*Only include the oservations for a single zip code (because the way I've calculated it, the numbers will just repeat in every zip code and be redundant in the graphs otherwise)
					twoway (line highP50 lowP50 year if zipcode == 1535 & year >= 2007 & year < 2016, lwidth(.4 .4)), scheme(`schemem') name(c50, replace) legend(label(1 "High Preparers") label(2 "Low Preparers") size(medsmall)) title("High/Low Cutoff 50", size(medium)) xlabel(2007(2)2015, labsize(vsmall)) ylabel(-1(1)3,labsize(small)) xtitle("") yscale(alt) graphregion(color(white)) bgcolor(white)
					twoway (line highP60 lowP60 year if zipcode == 1535 & year >= 2007 & year < 2016, lwidth(.4 .4)), scheme(`schemem') name(c60, replace) legend(label(1 "High Preparers") label(2 "Low Preparers") size(medsmall)) title("High/Low Cutoff 60", size(medium)) xlabel(2007(2)2015, labsize(vsmall)) ylabel(-1(1)3,labsize(small)) xtitle("") yscale(alt) graphregion(color(white)) bgcolor(white)
					twoway (line highP70 lowP70 year if zipcode == 1535 & year >= 2007 & year < 2016, lwidth(.4 .4)), scheme(`schemem') name(c70, replace) legend(label(1 "High Preparers") label(2 "Low Preparers") size(medsmall)) title("High/Low Cutoff 70", size(medium)) xlabel(2007(2)2015, labsize(vsmall)) ylabel(-1(1)3,labsize(small)) xtitle("") yscale(alt) graphregion(color(white)) bgcolor(white)
					twoway (line highP80 lowP80 year if zipcode == 1535 & year >= 2007 & year < 2016, lwidth(.4 .4)), scheme(`schemem') name(c80, replace) legend(label(1 "High Preparers") label(2 "Low Preparers") size(medsmall)) title("High/Low Cutoff 80", size(medium)) xlabel(2007(2)2015, labsize(vsmall)) ylabel(-1(1)3,labsize(small)) xtitle("") yscale(alt) graphregion(color(white)) bgcolor(white) 
					twoway (line highP90 lowP90 year if zipcode == 1535 & year >= 2007 & year < 2016, lwidth(.4 .4)), scheme(`schemem') name(c90, replace) legend(label(1 "High Preparers") label(2 "Low Preparers") size(medsmall)) title("High/Low Cutoff 90", size(medium)) xlabel(2007(2)2015, labsize(vsmall)) ylabel(-1(1)3,labsize(small)) xtitle("") yscale(alt) graphregion(color(white)) bgcolor(white)

					*Figure 2
					grc1leg c50 c60 c70 c80 c90, scheme(`schemem') title("Usage of Energy Tax Credit", size(medium)) legendfrom(c50) pos(12) graphregion(color(white))
			
			*Figure 3
				eststo clear
				foreach fe in statefips countyfips zipcode stateyeari countyyeari {
					ppmlhdfe pnenergy i.hpprep90##ib(2007).year pnrestate pnitemized p25t50k p50t75k p75t100k p100t200k p200kplus tot if year < 2016 & sub == 1, absorb(`fe') vce(robust) eform d
						eststo p_p_`fe'
				}					
				estout p_p_*, cells(b(star fmt(2)) se(par fmt(2))) starlevels(* .1 ** .05 *** .01) stat(r2_p N, fmt(3 0 0)) varwidth(30) order(1.hpprep*year) drop(0.* 2007.* *2007.year) eform
				
				set scheme s2manual
				coefplot 	(p_p_statefips, label(State FEs) ciopts(lwidth(*.5 *1.5 *3) lcolor(*.3 *.6 *1)) msymbol(s) mfcolor(white) graphregion(color(white)) bgcolor(white)) ///
							(p_p_countyfips, label(County FEs) ciopts(lwidth(*.5 *1.5 *3) lcolor(*.3 *.6 *1)) msymbol(t) mfcolor(white) graphregion(color(white)) bgcolor(white)) ///
							(p_p_zipcode, label(Zip Code FEs) ciopts(lwidth(*.5 *1.5 *3) lcolor(*.3 *.6 *1)) msymbol(o) mfcolor(white) graphregion(color(white)) bgcolor(white)) ///
							(p_p_stateyeari, label(State*Year FEs) ciopts(lwidth(*.5 *1.5 *3) lcolor(*.3 *.6 *1)) msymbol(|) mcolor(gray) graphregion(color(white)) bgcolor(white)) ///
							(p_p_countyyeari, label(County*Year FEs) ciopts(lwidth(*.5 *1.5 *3) lcolor(*.3 *.6 *1)) msymbol(X) mcolor(gray) graphregion(color(white)) bgcolor(white)), ///
					eform nolabels keep(1.hpprep90#*) xline(1) levels(99 95 90) legend(rows(2) span position(12)) ///
					coeflabels( 1.hpprep90#2009.year = "TopDecilePreparer*2009" ///
								1.hpprep90#2010.year = "TopDecilePreparer*2010" ///
								1.hpprep90#2011.year = "TopDecilePreparer*2011" ///
								1.hpprep90#2012.year = "TopDecilePreparer*2012" ///
								1.hpprep90#2013.year = "TopDecilePreparer*2013" ///
								1.hpprep90#2014.year = "TopDecilePreparer*2014" ///
								1.hpprep90#2015.year = "TopDecilePreparer*2015")

	**********************************************************************************	
	**Compare measures of tax prep****************************************************
	**********************************************************************************
	**********************************************************************************		
		*make a two digit state to state fips crosswalk
		*I need this to merge state-level chetty (two dig state abbrev) to IPUMS state level (statefip)
		use outputs\EnergyStarStateYear.dta, clear
			egen keeper = tag(abbr)
			drop if keeper == 0
			keep abbr state statefip
			
			rename state statelong 
			rename abbr state 
			
			save inputs\Chetty\Paul_statefip_to_abbr_crosswalk.dta, replace
		
		*Import chetty measures of bunching
		*Unit is three digit zip-year (zip3 is integer)
		use inputs\Chetty\sharpbunching.dta, clear
			*the two bunching measures are b_zip3 and b_zip3_selfemp_only
			pwcorr b_zip3 b_zip3_selfemp_only, sig
			
			*make a state-level version of this crosswalk
				merge m:1 zip3 using inputs\Chetty\zip3_state_crosswalk.dta

				collapse (mean) b_zip3 b_zip3_selfemp_only [pweight=count], by(tax_yr state)
				rename tax_yr year

				save inputs\Chetty\sharpbunching_Paul_StateLevel.dta, replace
								
		*make a three digit zip code measure of tax prep from the SOI and merge chetty onto it. 
			use outputs\SOIwork3.dta, clear
				tostring zipcode, gen(zipstr)
				gen zip3s = substr(zipstr,1,strlen(zipstr) - 2)
				destring zip3s, gen(zip3)
			
				collapse 	(mean) pprep pnenergy prep0t25k prep25t50k prep50t75k prep75t100k prep100t200k prep200kplus prep100kplus ///
							p0t25k p25t50k p50t75k p75t100k p100t200k p200kplus ///
							(rawsum) combinetot=tot (count) zipcount=zipcode [pweight=tot], by(year zip3)				

				drop if year < 2004
								
				rename year tax_yr
				
				merge 1:1 tax_yr zip3 using inputs\Chetty\sharpbunching.dta
					drop _merge
			
				*there are some different "states" in the chetty data. I drop them.
					merge m:1 zip3 using inputs\Chetty\zip3_state_crosswalk.dta

					drop if state == "DC"
					drop if state == "PR"
					drop if state == "GU"
					drop if state == "IRS"
					drop if state == "notinuse"
					drop if state == "military"
					drop if state == "VI"

				*Make decile dummies
					gen b_zip3_se = b_zip3_selfemp_only
						foreach name in pprep b_zip3 b_zip3_se {
							bysort state tax_yr: egen `name'bb90 = pctile(`name'), p(90) 
							gen k`name'90 = (`name' > `name'bb90)
							replace k`name'90 = . if `name' == .
							gen j`name'90 = k`name'90 if tax_yr == 2007
							bysort zip3: egen h`name'90 = max(j`name'90)
						}										
					sort zip3 tax_yr
				
				*Table 4 Panel A
					pwcorr hpprep90 hb_zip390 hb_zip3_se90 if tax_yr == 2007 & pprep != . & b_zip3 != ., sig obs
				
		*now state level
			use outputs\SOIState.dta, clear		
			
			merge 1:1 year statefip using outputs\TaxIPUMSstate.dta
			
			tabstat pnenergy pprep taxprep, by(year) s(mean)
				replace pprep = . if year == 1998
				replace pprep = . if year == 2001
				replace pprep = . if year == 2002
				
				replace taxprep = . if year < 2000
				
				replace pnenergy = . if year < 2007
				replace pnenergy = . if year == 2008
			tabstat pnenergy pprep taxprep, by(year) s(mean)
				
				drop _merge
				
			*now merge on IPUMS home ownership
				merge 1:1 statefip year using outputs\TaxIPUMS_HO_sy.dta
					drop _merge
			
			*merge on crosswalk so I can merge chetty
				merge m:1 statefip using inputs\Chetty\Paul_statefip_to_abbr_crosswalk.dta
					drop _merge
					
					egen keeper = tag(year state)
					drop if state == ""
					sum keeper
					sort state year
					drop if state == "PR"
					sum keeper
					drop keeper

			*now merge on chetty
				merge 1:1 year state using inputs\Chetty\sharpbunching_Paul_StateLevel.dta
					drop _merge
					
					egen keeper = tag(year statefip)
					drop keeper
					
			*now merge on ac sales
				gen abbr = state
				merge 1:1 year abbr using outputs\EnergyStarStateYear.dta
					drop _merge

					egen keeper = tag(year statefip)
					sum keeper
					drop keeper

				gen popmillion = pop / 1000000
				
				replace taxprep = taxprep * 1000

				*unemployment rate calculated from IPUMS 
				gen urate = unemployed / (unemployed + employed)
					sum urate

				gen nostateincometax = 0
					replace nostateincometax = 1 if statefip == 56
					replace nostateincometax = 1 if statefip == 53
					replace nostateincometax = 1 if statefip == 48
					replace nostateincometax = 1 if statefip == 46
					replace nostateincometax = 1 if statefip == 32
					replace nostateincometax = 1 if statefip == 12
					replace nostateincometax = 1 if statefip == 2
			
				gen stateincometax = 0
					replace stateincometax = 1 if nostateincometax == 0
					
				gen incthousands = ftotinc / 1000

				*there are some different "states" in the chetty data. I drop them
				drop if state == "DC"
				drop if state == "PR"
				drop if state == "GU"
				drop if state == "IRS"
				drop if state == "notinuse"
				drop if state == "military"
				drop if state == "VI"
				
				*I have to deal with North Dakota in the prep-post regs because it is missing from the pre-period
					replace pprep = . if statefip == 38 & year > 2005
										
				*Make decile dummies
					gen b_zip3_se = b_zip3_selfemp_only
						foreach occ in taxprep pprep b_zip3 b_zip3_se {
							bysort statefip: egen `occ'm = mean(`occ') if year > 2004 & year < 2008
							bysort statefip: egen `occ'mm = max(`occ'm)
								drop `occ'm

							egen `occ'b90 = pctile(`occ'mm) if year == 2005, p(90) 
							egen `occ'bb90 = max(`occ'b90)
							drop `occ'b90
							tabstat `occ'bb*, s(mean)
							
							gen h`occ'90 = (`occ'mm > `occ'bb90)
						}								
					sort state year

				*Table 4 Panel B
					pwcorr htaxprep90 hpprep90 hb_zip390 hb_zip3_se90 if year == 2005, sig obs

				*get times from tests above. 
				*time from AC tests
					*event time: the law came into effect for 2006 tax year
						gen et = .
						*pre period
						replace et = 1 if year == 2004
						replace et = 1 if year == 2003
						replace et = 1 if year == 2002
						replace et = 1 if year == 2001
						replace et = 1 if year == 2000
						*post period
						replace et = 2 if year == 2005
						replace et = 2 if year == 2006
						replace et = 2 if year == 2007
						replace et = 2 if year == 2008
						replace et = 2 if year == 2009	
							
				eststo clear
				
				pwcorr stateincometax sy_homeownership urate educ age married ipmsp0t25k ipmsp25t50k ipmsp100t200k ipmsp200kplus popmillion if year == 2005, sig
				
				*Table 4 Panel C: model measures of local tax knowledge at the state level
					reghdfe htaxprep90 stateincometax sy_homeownership urate educ age married ipmsp0t25k ipmsp25t50k ipmsp100t200k ipmsp200kplus popmillion if year == 2005, noabsorb vce(robust)
						eststo mtp
					reghdfe hpprep90 stateincometax sy_homeownership urate educ age married ipmsp0t25k ipmsp25t50k ipmsp100t200k ipmsp200kplus popmillion if year == 2005, noabsorb vce(robust)
						eststo mpp
					reghdfe hb_zip390 stateincometax sy_homeownership urate educ age married ipmsp0t25k ipmsp25t50k ipmsp100t200k ipmsp200kplus popmillion if year == 2005, noabsorb vce(robust)
						eststo mz1
					reghdfe hb_zip3_se90 stateincometax sy_homeownership urate educ age married ipmsp0t25k ipmsp25t50k ipmsp100t200k ipmsp200kplus popmillion if year == 2005, noabsorb vce(robust)
						eststo mz2
					estout mtp mpp mz1 mz2, cells(b(star fmt(4))) starlevels(* .1 ** .05 *** .01) stat(r2 N, fmt(3 0 0)) varwidth(30)
					estout mtp mpp mz1 mz2, cells(b(star fmt(4)) se(par fmt(4))) starlevels(* .1 ** .05 *** .01) stat(r2 N, fmt(3 0 0)) varwidth(30)
						
				*Table 5
					reghdfe atypeAC i.et##c.htaxprep90 sy_homeownership ipmsp25t50k ipmsp50t75k ipmsp75t100k ipmsp100t200k ipmsp200kplus popmillion if year > 2001 & year < 2010, absorb(state) vce(robust)
						eststo ttt_taxprep
					reghdfe atypeAC i.et##c.hpprep90 sy_homeownership ipmsp25t50k ipmsp50t75k ipmsp75t100k ipmsp100t200k ipmsp200kplus popmillion if year > 2001 & year < 2010 & statefip != 38, absorb(state) vce(robust)
						eststo ttt_pprep
					reghdfe atypeAC i.et##c.hb_zip390 sy_homeownership ipmsp25t50k ipmsp50t75k ipmsp75t100k ipmsp100t200k ipmsp200kplus popmillion if year > 2001 & year < 2010, absorb(state) vce(robust)
						eststo ttt_chetty1
					reghdfe atypeAC i.et##c.hb_zip3_se90 sy_homeownership ipmsp25t50k ipmsp50t75k ipmsp75t100k ipmsp100t200k ipmsp200kplus popmillion if year > 2001 & year < 2010, absorb(state) vce(robust)
						eststo ttt_chetty2
					
					reghdfe atypeAC i.et##c.htaxprep90 i.et##c.hb_zip390 sy_homeownership ipmsp25t50k ipmsp50t75k ipmsp75t100k ipmsp100t200k ipmsp200kplus popmillion if year > 2001 & year < 2010, absorb(state) vce(robust)
						eststo ttt_taxprepchetty1
					reghdfe atypeAC i.et##c.hpprep90 i.et##c.hb_zip390 sy_homeownership ipmsp25t50k ipmsp50t75k ipmsp75t100k ipmsp100t200k ipmsp200kplus popmillion if year > 2001 & year < 2010 & statefip != 38, absorb(state) vce(robust)
						eststo ttt_pprepchetty1

					estout ttt*, cells(b(star fmt(4))) starlevels(* .1 ** .05 *** .01) stat(r2 N, fmt(3 0 0)) varwidth(30) drop(1.* htaxprep90 hpprep90 hb_zip390 hb_zip3_se90) order(2.et#*)
					estout ttt*, cells(b(star fmt(4)) se(par fmt(4))) starlevels(* .1 ** .05 *** .01) stat(r2 N, fmt(3 0 0)) varwidth(30) drop(1.* htaxprep90 hpprep90 hb_zip390 hb_zip3_se90) order(2.et#*)
			

						
